Learning to Extract Signature and Reply Lines from Email

Authors

  • Vitor R. Carvalho
  • William W. Cohen
Abstract

We describe methods for automatically identifying signature blocks and reply lines in plain-text email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems, anonymizing email corpora, improving automatic content-based mail classifiers, and email threading. Our method applies machine learning to a sequential representation of an email message: each email is represented as a sequence of lines, and each line as a set of features. We compare several state-of-the-art sequential and non-sequential machine learning algorithms on different feature sets, and present experimental results showing that the presence of a signature block in a message can be detected with accuracy higher than 97%; that signature block lines can be identified with accuracy higher than 99%; and that signature block and reply lines can be simultaneously identified with accuracy higher than 98%.
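The sequential representation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names, the regular expressions, and the helper functions `line_features` and `email_to_sequence` are hypothetical and are not taken from the paper, which does not list its feature set in the abstract.

```python
import re

# Hypothetical line-level patterns; illustrative only, not the paper's feature set.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
SEPARATOR_RE = re.compile(r"^\s*[-_*=]{2,}\s*$")
QUOTE_RE = re.compile(r"^\s*>")


def line_features(line):
    """Map one line of an email to a set of (assumed) binary features."""
    features = set()
    if not line.strip():
        features.add("blank")
    if SEPARATOR_RE.search(line):
        features.add("separator")
    if QUOTE_RE.search(line):
        features.add("quote_prefix")
    if EMAIL_RE.search(line):
        features.add("has_email")
    if PHONE_RE.search(line):
        features.add("has_phone")
    return features


def email_to_sequence(text):
    """Represent an email as a sequence of per-line feature sets."""
    return [line_features(line) for line in text.splitlines()]


if __name__ == "__main__":
    message = (
        "Hi Bob,\n"
        "> are you coming?\n"
        "Yes.\n"
        "--\n"
        "Alice Smith\n"
        "alice@example.com\n"
        "(412) 555-0100"
    )
    for line, feats in zip(message.splitlines(), email_to_sequence(message)):
        print(repr(line), sorted(feats))
```

A sequence of feature sets like this could then be passed to either a per-line classifier or a sequential learner; which learners and features perform best is what the paper's experiments compare.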




Publication date: 2004